Systems and methods for the determination of a copy number of a genomic sequence

ABSTRACT

Systems and methods for the determination of a copy number of a target genomic sequence, either a target gene or a genomic sequence of interest, in a biological sample are described. Various methods utilize a model drawn from a probability density function (PDF) for the assignment of a copy number of a target genomic sequence in a biological sample. The methods also provide for the determination of a confidence value for a copy number assigned to a sample based on attributes of the sample data. Additionally, various embodiments of an interactive graphical user interface (GUI) may provide an end-user with ready analysis of large sets of data representing a plurality of samples. In various embodiments of an interactive GUI, an end-user may be provided with a synchronized display of tabular and graphical sample data determined by an initial analysis according to a statistical model of a PDF. Such a synchronized display may enable an end-user to readily identify sample data for a subsequent analysis based on user input.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application Ser. No. 61/564,503 filed Nov. 29, 2011, which disclosure is herein incorporated by reference in its entirety.

FIELD

The field of disclosure relates to systems and methods for the determination of a copy number of a genomic sequence in a biological sample.

BACKGROUND

The polymerase chain reaction (PCR) represents an extensive family of chemistries that have produced numerous types of assays of impact in biological analysis. Accordingly, concomitant to the innovation of assays for this family of chemistries has been the innovation of computational methods matched to the objectives of the various PCR-based assays.

For example, one type of computational method suited to various types of quantitative PCR (qPCR) assays is often referred to as the comparative threshold cycle (C_(t)) method. As one of ordinary skill in the art is apprised, the cycle threshold, C_(t), indicates the cycle number at which an amplified target genomic sequence, either a gene or genomic sequence of interest, reaches a fixed threshold. A relative concentration of a target genomic sequence may be determined using C_(t) determinations for the target genomic sequence, a reference genomic sequence, which for many qPCR assays may be either an endogenous or exogenous reference genomic sequence, and additionally, a calibrator sequence. After normalizing the C_(t) data for the target gene sequence and the calibrator gene sequence to the reference gene sequence samples, under the assumption that the efficiencies of the reactions are equal and essentially 100%, one of ordinary skill in the art would recognize the calculation for the comparative C_(t) method as:

X_(N,t)/X_(N,c) = 2^(−ΔΔC_(t));

where

X_(N,t)/X_(N,c) is the relative concentration of the target in comparison to the calibrator; and

ΔΔC_(t) is the normalized difference in threshold cycles for the target and the calibrator.
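The comparative C_(t) calculation above can be illustrated with a minimal sketch. The function name, the argument names, and the convention that ΔC_(t) is the target C_(t) minus the reference C_(t) are assumptions made for illustration; the sketch also assumes equal, essentially 100% efficiencies as stated above.

```python
def relative_quantity(ct_target_sample, ct_ref_sample, ct_target_cal, ct_ref_cal):
    """Relative target concentration, sample vs. calibrator (hypothetical helper).

    Assumes 100% efficiency for every reaction, so each cycle of
    difference corresponds to a factor of two.
    """
    # Normalize target C_t to the reference C_t (assumed sign convention).
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_cal = ct_target_cal - ct_ref_cal
    # Difference of the two normalized values.
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)


# Illustrative values only: the sample crosses threshold one cycle earlier
# than the calibrator after normalization.
ratio = relative_quantity(25.0, 24.0, 26.0, 24.0)
```

With these illustrative inputs, ΔΔC_(t) is −1, so the sample carries twice the relative target concentration of the calibrator.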

In practice, the efficiency of the PCR process may not be exactly 100%, as the concentration of genetic material may not double at every cycle. Factors that may affect the efficiency of an amplification reaction may include, for example, reaction conditions such as the difference in the detection limit for the dye used for a target genomic sequence versus the dye for the reference, or inherent differences in the sequence context of the target genomic sequence and a reference genomic sequence. However, as assays are optimized to ensure the highest efficiencies, any deviations from the assumption of 100% efficiency are generally small. In addition to possible deviations from ideality, there are variations among replicate samples of the same sequence, due to contributions in an assay system from both the chemistry and instrumentation.

Accordingly, various embodiments of systems and methods for the determination of a gene copy number according to the present teachings use statistical models of a probability density function (PDF) to assign a copy number to a sample in a population of samples, and determine a confidence value for the assignment. Such methods take into account various assay deviations and variations. Unlike the comparative C_(t) method, or ΔΔC_(t) method, as it is often called, various methods for the determination of a gene copy number utilize the information in ΔC_(t) determinations of samples, and therefore do not require the use of calibration sample data.

In various embodiments of systems and methods for the determination of a gene copy number according to the present teachings, various embodiments of an interactive graphical user interface (GUI) may provide an end-user with ready viewing and interactive analysis of large sets of data representing a plurality of samples. In various embodiments of an interactive GUI according to the present teachings, an end-user may be provided with a synchronized display of tabular and graphical sample data. Such a synchronized display may enable an end-user to readily identify sample data for a subsequent analysis based on user input.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that illustrates components of an exemplary computer system that may be utilized in the control and interface of a system used for processing biological samples for qPCR.

FIG. 2 is a block diagram of an example of some instrument features that may be useful in the processing of biological samples for qPCR.

FIG. 3 is an exemplary probability sample frequency population resulting from various embodiments of methods for the determination of gene copy number, which was estimated using a probability density function model based on a normal distribution.

FIG. 4 is a representation of an input/output diagram for various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

FIG. 5 is a flow chart depicting various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

FIG. 6 is a flow chart depicting various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

FIG. 7 depicts synchronized graphic and tabular sample results for various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

FIG. 8 is a flow chart that depicts various embodiments for plotting and displaying a probability density scatter plot according to the present teachings.

FIG. 9A and FIG. 9B depict embodiments of a probability distribution scatter plot that may be utilized in various embodiments of an interactive GUI for the analysis of a copy number for a biological sample.

FIG. 10 depicts synchronized user input for sub-distribution population information and tabular sample data for various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

FIG. 11 depicts synchronized graphic and tabular sample results for various embodiments of an interactive GUI for the analysis of copy number for a biological sample.

DETAILED DESCRIPTION

What is disclosed herein are various embodiments of systems and methods for the determination of a copy number of a target genomic sequence, either a target gene or genomic sequence of interest, in a biological sample. In various embodiments of an interactive GUI according to the present teachings, a synchronized display of tabular and graphical sample data may enable an end-user to readily and effectively view and analyze large sets of sample data. According to the present teachings, various embodiments of an interactive GUI may display synchronized graphical and tabular results for each sample in a plurality of samples based on a model drawn from a probability density function (PDF). In various embodiments of an interactive GUI according to the present teachings, such a graphical display may include a probability density scatter plot, which allows an end-user to view and query a sample in a set of samples in a discrete interval or bin of a probability density scatter plot. Additionally, various embodiments provide for the determination of a confidence value for a copy number assigned to a sample based on attributes of the sample data. Accordingly, a confidence value so determined may provide for an independent evaluation of the assigned copy number generated using a PDF model. Various embodiments of an interactive GUI according to the present teachings may provide for end-user input that includes selection of groupings of sub-distributions in a PDF in order to address potential issues of low confidence values for samples falling in copy number sub-distributions having, for example, but not limited by, high sample variability.

The type of assay that is used to provide the data for various embodiments of methods for the determination of a copy number is known to one of ordinary skill in the art as the real-time quantitative polymerase chain reaction (real-time qPCR), in which nucleic acid present in a sample may be amplified.

According to various embodiments, the terms “amplified”, “amplifying”, “amplification” and related terms may refer to any process that increases the amount of a desired nucleic acid. Any of a variety of known amplification procedures may be employed in the present teachings, including PCR (see for example U.S. Pat. No. 4,683,202), as well as any of a variety of ligation-mediated approaches, including LDR and LCR (see for example U.S. Pat. No. 5,494,810, U.S. Pat. No. 5,830,711, U.S. Pat. No. 6,054,564). Some other amplification procedures include isothermal approaches such as rolling circle amplification and helicase-dependent amplification. One of skill in the art will readily appreciate a variety of possible amplification procedures applicable in the context of the present teachings. For example, in some embodiments, the amplification may comprise a PCR comprising a real-time detection, using for example a labeling probe.

The term “labeling probe” generally, according to various embodiments, refers to a molecule used in an amplification reaction, typically for quantitative or real-time PCR analysis, as well as end-point analysis. Such labeling probes may be used to monitor the amplification of the target polynucleotide. In some embodiments, oligonucleotide probes present in an amplification reaction are suitable for monitoring the amount of amplicon(s) produced as a function of time. Such oligonucleotide probes include, but are not limited to, the 5′-exonuclease assay TaqMan® probes described herein (see also U.S. Pat. No. 5,538,848), various stem-loop molecular beacons (see e.g., U.S. Pat. Nos. 6,103,476 and 5,925,517 and Tyagi and Kramer, 1996, Nature Biotechnology 14:303-308), stemless or linear beacons (see, e.g., WO 99/21881), PNA Molecular Beacons™ (see, e.g., U.S. Pat. Nos. 6,355,421 and 6,593,091), linear PNA beacons (see, e.g., Kubista et al., 2001, SPIE 4264:53-58), non-FRET probes (see, e.g., U.S. Pat. No. 6,150,097), Sunrise®/Amplifluor® probes (U.S. Pat. No. 6,548,250), stem-loop and duplex Scorpion™ probes (Solinas et al., 2001, Nucleic Acids Research 29:E96 and U.S. Pat. No. 6,589,743), bulge loop probes (U.S. Pat. No. 6,590,091), pseudo knot probes (U.S. Pat. No. 6,589,250), cyclicons (U.S. Pat. No. 6,383,752), MGB Eclipse™ probe (Epoch Biosciences), hairpin probes (U.S. Pat. No. 6,596,490), peptide nucleic acid (PNA) light-up probes, self-assembled nanoparticle probes, and ferrocene-modified probes described, for example, in U.S. Pat. No. 6,485,901; Mhlanga et al., 2001, Methods 25:463-471; Whitcombe et al., 1999, Nature Biotechnology 17:804-807; Isacsson et al., 2000, Molecular Cell Probes 14:321-328; Svanvik et al., 2000, Anal Biochem. 281:26-35; Wolffs et al., 2001, Biotechniques 766:769-771; Tsourkas et al., 2002, Nucleic Acids Research 30:4208-4215; Riccelli et al., 2002, Nucleic Acids Research 30:4088-4093; Zhang et al., 2002 Shanghai 34:329-332; Maxwell et al., 2002, J. Am. Chem. Soc. 124:9606-9612; Broude et al., 2002, Trends Biotechnol. 20:249-56; Huang et al., 2002, Chem Res. Toxicol. 15:118-126; and Yu et al., 2001, J. Am. Chem. Soc 14:11155-11161. Labeling probes can also comprise black hole quenchers (Biosearch), Iowa Black (IDT), QSY quencher (Molecular Probes), and Dabsyl and Dabcel sulfonate/carboxylate Quenchers (Epoch). Labeling probes can also comprise two probes, wherein for example a fluorophore is on one probe, and a quencher on the other, wherein hybridization of the two probes together on a target quenches the signal, or wherein hybridization on target alters the signal signature via a change in fluorescence. Labeling probes can also comprise sulfonate derivatives of fluorescein dyes with a sulfonic acid group instead of the carboxylate group, phosphoramidite forms of fluorescein, and phosphoramidite forms of CY 5 (available for example from Amersham). In some embodiments, intercalating labels are used such as ethidium bromide, SYBR® Green I (Molecular Probes), and PicoGreen® (Molecular Probes), thereby allowing visualization in real-time, or end point, of an amplification product in the absence of a labeling probe.

As will be discussed in more detail subsequently, various embodiments of systems and methods may utilize detector signal data collected for a plurality of samples for a copy number assay. Such signals may be stored in a variety of computer readable media. In various embodiments according to the present teachings, a computer program product may be provided, which may include a tangible computer-readable storage medium whose contents include a program with instructions that when executed on a processor perform a method for providing an end-user with the ability to sequentially and rapidly analyze and evaluate the sample data.

FIG. 1 is a block diagram that illustrates a computer system 100 that may be employed to carry out processing functionality, according to various embodiments, upon which embodiments of the present teachings may be implemented. Computing system 100 can include one or more processors, such as a processor 104. Processor 104 can be implemented using a general or special purpose processing engine such as, for example, a microprocessor, controller or other control logic. In this example, processor 104 is connected to a bus 102 or other communication medium.

Further, it should be appreciated that a computing system 100 of FIG. 1 may be embodied in any of a number of forms, such as a rack-mounted computer, mainframe, supercomputer, server, client, a desktop computer, a laptop computer, a tablet computer, hand-held computing device (e.g., PDA, cell phone, smart phone, palmtop, etc.), cluster grid, netbook, embedded systems, or any other type of special or general purpose computing device as may be desirable or appropriate for a given application or environment. Additionally, a computing system 100 can include a conventional network system including a client/server environment and one or more database servers, or integration with LIS/LIMS infrastructure. A number of conventional network systems, including a local area network (LAN) or a wide area network (WAN), and including wireless and/or wired components, are known in the art. Additionally, client/server environments, database servers, and networks are well documented in the art.

Computing system 100 may include bus 102 or other communication mechanism for communicating information, and processor 104 coupled with bus 102 for processing information.

Computing system 100 also includes a memory 106, which can be a random access memory (RAM) or other dynamic memory, coupled to bus 102 for storing instructions to be executed by processor 104. Memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computing system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104.

Computing system 100 may also include a storage device 110, such as a magnetic disk, optical disk, or solid state drive (SSD), coupled to bus 102 for storing information and instructions. Storage device 110 may include a media drive and a removable storage interface. A media drive may include a drive or other mechanism to support fixed or removable storage media, such as a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), flash drive, or other removable or fixed media drive. As these examples illustrate, the storage media may include a computer-readable storage medium having stored therein particular computer software, instructions, and/or data.

In alternative embodiments, storage device 110 may include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing system 100. Such instrumentalities may include, for example, a removable storage unit and an interface, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units and interfaces that allow software and data to be transferred from the storage device 110 to computing system 100.

Computing system 100 can also include a communications interface 118. Communications interface 118 can be used to allow software and data to be transferred between computing system 100 and external devices. Examples of communications interface 118 can include a modem, a network interface (such as an Ethernet or other NIC card), a communications port (such as for example, a USB port, a RS-232C serial port), a PCMCIA slot and card, Bluetooth, and the like. Software and data transferred via communications interface 118 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 118. These signals may be transmitted and received by communications interface 118 via a channel such as a wireless medium, wire or cable, fiber optics, or other communications medium. Some examples of a channel include a phone line, a cellular phone link, an RF link, a network interface, a local or wide area network, and other communications channels.

Computing system 100 may be in communication through communications interface 118 to a display 112, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light-emitting diode (LED) display for displaying information to a computer user. In various embodiments, computing system 100 may be coupled to a display through a bus. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104, for example. An input device may also be a display, such as an LCD display, configured with touch screen input capabilities. Another type of user input device is cursor control 116, such as a mouse, a trackball or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. A computing system 100 provides data processing and provides a level of confidence for such data. Consistent with certain implementations of embodiments of the present teachings, data processing and confidence values are provided by computing system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in memory 106. Such instructions may be read into memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in memory 106 causes processor 104 to perform the process states described herein. Alternatively, hard-wired circuitry may be used in place of or in combination with software instructions to implement embodiments of the present teachings. Thus, implementations of embodiments of the present teachings are not limited to any specific combination of hardware circuitry and software.

The terms “computer-readable medium” and “computer program product” as used herein generally refer to any media that is involved in providing one or more sequences of one or more instructions to processor 104 for execution. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 100 to perform features or functions of embodiments of the present invention. These and other forms of computer-readable media may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, solid state, optical or magnetic disks, such as storage device 110. Volatile media includes dynamic memory, such as memory 106. Transmission media includes coaxial cables, copper wire, and fiber optics, including connectivity to bus 102.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computing system 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector coupled to bus 102 can receive the data carried in the infra-red signal and place the data on bus 102. Bus 102 carries the data to memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.

Those skilled in the art will recognize that the operations of the various embodiments may be implemented using hardware, software, firmware, or combinations thereof, as appropriate. For example, some processes can be carried out using processors or other digital circuitry under the control of software, firmware, or hard-wired logic. (The term “logic” herein refers to fixed hardware, programmable logic and/or an appropriate combination thereof, as would be recognized by one skilled in the art to carry out the recited functions.) Software and firmware can be stored on computer-readable media. Some other processes can be implemented using analog circuitry, as is well known to one of ordinary skill in the art. Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention.

It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

Various embodiments of systems and methods for the determination of a copy number according to the present teachings may utilize various embodiments of a cycler instrument as depicted in the block diagram shown in FIG. 2. As shown in FIG. 2, a thermal cycling instrument may include a heated cover 214 that is placed over a plurality of samples 216 contained in a sample support device. In various embodiments, a sample support device may be a glass, metal or plastic slide or substrate with a plurality of sample regions, which sample regions have a cover between the sample regions and heated cover 214. Some examples of a sample support device may include, but are not limited by, a multi-well plate, such as a standard microtiter 96-well plate, a 384-well plate, a micro device capable of processing thousands of samples per analysis or a microcard, or a substantially planar support, such as various microfluidic devices, microcard devices, and micro chip devices fabricated from, for example, but not limited by, a glass, metal or plastic slide or substrate. The sample regions in various embodiments of a sample support device may include depressions, indentations, holes, ridges, and combinations thereof, patterned in regular or irregular arrays formed on the surface of the slide or substrate. Various embodiments of a thermal cycler instrument may include a sample block 218, elements for heating and cooling 220, and a heat exchanger 222.

Various embodiments of a thermal cycler instrument can process multiple samples simultaneously, and may be used in the generation and acquisition of copy number assay data. In FIG. 2, various embodiments of a thermal cycling system 200 provide a detection system for the run time acquisition of signals for each sample in a plurality of biological samples, over the entire time that a copy number assay is performed. A detection system may have an illumination source that emits electromagnetic energy, and a detector or imager 210, for receiving electromagnetic energy from samples 216 in the sample support device.

A control system 224 may be used to control the functions of the detection system, heated cover, and thermal block assembly. The control system may be accessible to an end-user through user interface 226 of thermal cycler instrument 200. A computer system 100, as depicted in FIG. 1, may serve to provide the control function of a thermal cycler instrument, as well as the user interface function. Additionally, computer system 100 may provide data processing, display and report preparation functions. All such instrument control functions may be dedicated locally to the thermal cycler instrument, or computer system 100 may provide remote control of part or all of the control, analysis, and reporting functions.

As previously described, a large volume of copy number data may be generated as detector signal data is collected over the entirety of a defined time for thermal cycling for each of a large number of samples analyzed during the same run. Given the large volume of data collected over any given copy number analysis, various embodiments of systems and methods of the present teachings provide for embodiments of computer readable media that may generate processed data from initial copy number assay data collected for each sample in a sample support device.

Additionally, various embodiments of systems and methods of the present teachings provide for embodiments of computer readable media that may allow an end-user the flexibility to dynamically analyze large data sets, and selected subsets thereof, using an interactive user interface. Such an interactive user interface may assist an end-user in selection of, for example, but not limited by, a new set of analysis parameters, another method by which the data may be analyzed, the review of data for selected replicate sets of data, as well as the associated statistics for the replicate sets, and the review of which sets of data may fall within a selected threshold in comparison to a target set of samples.

FIG. 3 shows an exemplary determination of copy number for a set of samples using various embodiments of a previously described probability density function (PDF) statistical method, US 2010/0228496, which is herein incorporated by reference in its entirety.

As is apparent from inspection of FIG. 3, the PDF for this sample set covers the range from copy numbers (CNs) 1 to 5. A characteristic of the exemplary distribution of FIG. 3 is that the separation between the means of the CN sub-distributions decreases as CN increases. This is a direct consequence of the logarithmic relationship between ΔC_(t) and the concentration of genomic material within the context of PCR, as will be discussed in more detail subsequently. As a result of the decreasing separation between sub-distributions with increasing CN, the variability of ΔC_(t) values has a larger impact on the resolution of the higher CN values. As measurement variability increases, the average confidence of higher CN calls will decrease much faster than confidence values for lower CNs. Additionally, as will be discussed in more detail subsequently, the relative probability of a copy number, the P_(CN), can influence the confidence value associated with a call. An approximate trend is that the confidence of copy number calls increases with increases in the frequency of samples belonging to that CN group. Various embodiments may be used to specify optimum ΔC_(t) decision boundaries for CN value assignment. As is depicted in FIG. 3, it is apparent that these boundaries should be placed at the minimum PDF values between the peaks of the PDF since, to either side of these boundaries, there is a larger likelihood that the CN corresponds to that of the closer peak in the PDF.
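The shrinking separation between sub-distributions and the placement of decision boundaries at PDF minima can be sketched as follows. This is an illustrative sketch only, not the method of US 2010/0228496: it assumes 100% efficiency (so each doubling of copies shifts the expected ΔC_(t) by one cycle), a mixture of normal sub-distributions with a common standard deviation `sigma`, and hypothetical relative probabilities `weights` standing in for P_(CN). All function names are invented for illustration.

```python
import math


def expected_delta_ct(cn, delta_ct_cn1=0.0):
    # Assumption: 100% efficiency, so the expected dCt falls by log2(CN)
    # relative to the (hypothetical) dCt of a single-copy sample.
    return delta_ct_cn1 - math.log2(cn)


def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, sigma):
    # weights maps copy number -> relative probability of that copy number.
    return sum(w * normal_pdf(x, expected_delta_ct(cn), sigma)
               for cn, w in weights.items())


def decision_boundary(cn, weights, sigma, steps=2000):
    # Grid search for the minimum of the mixture PDF between the peaks
    # of the CN and CN+1 sub-distributions.
    hi = expected_delta_ct(cn)
    lo = expected_delta_ct(cn + 1)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: mixture_pdf(x, weights, sigma))


weights = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}  # illustrative values
# Separation between adjacent sub-distribution means for CN 1..5.
gaps = [expected_delta_ct(cn) - expected_delta_ct(cn + 1) for cn in range(1, 5)]
boundary_1_2 = decision_boundary(1, weights, sigma=0.1)
```

Under these assumptions the gap between adjacent means is log2((CN+1)/CN), which shrinks from one full cycle between CN 1 and CN 2 to roughly a third of a cycle between CN 4 and CN 5, so the same ΔC_(t) variability blurs the higher CN calls more.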

FIG. 4 depicts an input/output diagram depicting various embodiments of systems and methods for the determination of a copy number. Various embodiments of systems and methods according to the present teachings provide for embodiments of an interactive graphical user interface (GUI), which may provide an end-user the ability to dynamically analyze large data sets for copy number determination. As will be discussed in more detail subsequently, and as depicted in FIG. 4, various embodiments of an interactive GUI may provide for end-user input regarding selection of sub-distribution groupings or sub-groupings to be analyzed as a collective group. As previously discussed, the variability of ΔC_(t) values may have a larger impact on the resolution of the higher CN values, thereby impacting a confidence value that may be associated with a sample for which a higher CN value has been initially assigned. For various embodiments of an interactive GUI according to the present teachings, an end-user, for example, may wish to know an assignment of CNs to any sample for CNs 1-2, and then anything above CN 2 as a collective group. By way of another example, an end-user may wish to assign copy numbers to samples for CNs 1, 2, 3, and 4, specifically, but may wish to further assign a CN for samples falling within a range of CN 5-7, and then any sample falling above CN 7. In this regard, for various embodiments of an interactive GUI of the present teachings, an end-user, knowing that a confidence value may be low for a sample to which a single CN value is assigned, can input limits and ranges on copy numbers. In this fashion, end-user input designating grouping of sub-distributions provides for samples to be analyzed as a collective group. As will be discussed subsequently, such a collective group analysis may increase the confidence value associated with the collective group PDF.
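The effect of grouping sub-distributions can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sample already has a per-CN probability from the PDF model and that the confidence of a grouped call is the sum of the probabilities of the CNs in the group; the function names, the group labels, and the numeric values are hypothetical and are not taken from the present teachings.

```python
def group_probabilities(per_cn_probability, groups):
    """Sum per-CN probabilities into user-defined groups.

    groups maps a label to an inclusive (low, high) CN range;
    high=None means the range is open-ended.
    """
    grouped = {}
    for label, (low, high) in groups.items():
        grouped[label] = sum(
            p for cn, p in per_cn_probability.items()
            if cn >= low and (high is None or cn <= high)
        )
    return grouped


def call_group(per_cn_probability, groups):
    # Assign the sample to the group with the largest summed probability.
    grouped = group_probabilities(per_cn_probability, groups)
    label = max(grouped, key=grouped.get)
    return label, grouped[label]


# Illustrative sample whose probability is spread over CNs 3 to 5.
sample = {1: 0.0, 2: 0.05, 3: 0.40, 4: 0.35, 5: 0.20}
# End-user input: CN 1 and CN 2 individually, anything above CN 2 together.
groups = {"1": (1, 1), "2": (2, 2), ">2": (3, None)}
label, confidence = call_group(sample, groups)
```

In this example the best single-CN call (CN 3) would carry a probability of only 0.40, whereas the collective ">2" group carries about 0.95, which mirrors the motivation for letting the end-user set limits and ranges on copy numbers.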

As depicted in FIG. 4, and depicted in step 310 of FIG. 5 and step 410 of FIG. 6, for various embodiments of systems and methods of the present teachings, a set of copy number data for each sample is received by a processor. As indicated in FIG. 4, the data for each sample may include the output from the instrument detector, as well as information about each sample provided in the plate setup created by an end-user. FIG. 5 and FIG. 6 depict various embodiments of systems and methods for the determination of copy number. In step 320 of FIG. 5, an end-user may input information regarding partitioning of copy numbers into sub-distribution groupings or sub-groupings as previously described. Such information may provide ranges and limits for copy numbers. In various embodiments of FIG. 5, and as depicted in step 320 of FIG. 5, the user input may be provided before a determination of a CN for a sample is done, as indicated in step 330 of FIG. 5, and before information provided by a synchronized GUI display of graphic and tabular data is reviewed by the end-user, as depicted in step 340. In various embodiments of FIG. 6, and as depicted in step 440, the end-user input may be provided after a determination of a CN for a sample is done, as indicated in step 420 of FIG. 6, and after end-user review of information provided by a synchronized GUI display of graphic and tabular data, as depicted in step 430 of FIG. 6. As one of ordinary skill in the art is apprised, combinations of various systems and methods as depicted in FIG. 5 and FIG. 6 are possible. For example, but not limited by, an end-user may input the information regarding sub-groupings before a CN determination is done and also after reviewing information provided by a synchronized GUI display of graphic and tabular data. Further, as indicated in FIG. 6 for method 400, review of information provided by a synchronized GUI display of graphic and tabular data and input provided by the end-user may be an iterative process for various embodiments of systems and methods for the determination of a copy number according to the present teachings.

Regarding step 330 of FIG. 5 for method 300, and steps 420 and 450 of FIG. 6 for method 400, determination of a CN and a confidence value may be done based on the determination of ΔC_(t) for each sample.

By way of providing an overview of the calculations for ΔC_(t) and ΔΔC_(t) for a copy number assay, the calculation of ΔC_(t) values from a data set is based on the equation for the progress of reaction for a PCR assay. It is well known that for a PCR reaction the equation describing the exponential amplification of PCR is given by:

X _(n) =X _(o)[(1+E _(X))^(n)]  (1)

where:

X_(n)=the number of target molecules at cycle n

X_(o)=the initial number of target molecules

E_(X)=the efficiency of the target amplification

n=the number of cycles

From that relationship, the concentration of a genomic sequence at the threshold is:

X _(Ct,x) =X _(o)[(1+E _(X))^(Ct,x)] =K _(X)  (2)

where:

X_(Ct,x)=the number of target molecules at C_(t)

X_(o)=the initial number of target molecules

E_(X)=the efficiency of the target amplification

C_(t,x)=the number of cycles at C_(t)

K_(X)=a constant

From this it is evident that for a target genomic sequence, either a gene or genomic sequence of interest, the concentration of target formed in the reaction at C_(t) is a constant K, and therefore characteristic of the reaction. Generally, K may vary for various target genomic sequences, due to a number of reaction variables, such as, for example, the reporter dye used in a probe, the efficiency of the probe cleavage, and the setting of the detection threshold. Additionally, as previously described, it is generally assumed that the efficiencies of reactions are optimized and essentially the same. Under such conditions and assumptions, it can be shown through the algebraic manipulation of EQ. 2 that normalizing a target genomic sequence of interest to an endogenous reference reaction at C_(t) yields the following relationship:

X _(N) =K[(1+E)^(−ΔCt)]  (3)

where:

X_(N)=is the normalized amount of the target

ΔC_(t)=is the difference in threshold cycles for the target and endogenous reference genomic sequence
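The algebraic step from EQ. 2 to EQ. 3 is not written out in the text. One way it can be sketched, assuming as stated that the target and reference reactions share the same efficiency E, is to write EQ. 2 once for the target and once for the endogenous reference and divide. The reference-side symbols R_o, C_(t,r), and K_R are notation introduced here by analogy with the target-side symbols and do not appear in the original.

```latex
\begin{aligned}
X_{o}\,(1+E)^{C_{t,x}} &= K_{X}, \qquad R_{o}\,(1+E)^{C_{t,r}} = K_{R} \\
\frac{X_{o}}{R_{o}}\,(1+E)^{C_{t,x}-C_{t,r}} &= \frac{K_{X}}{K_{R}} \equiv K \\
X_{N} \equiv \frac{X_{o}}{R_{o}} &= K\,(1+E)^{-\Delta C_{t}}, \qquad \Delta C_{t} = C_{t,x}-C_{t,r}
\end{aligned}
```

Applying the same normalization to a calibrator and taking the ratio cancels K, which is how a relationship of the form (1+E)^(−ΔΔCt) arises.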

Further, it should be noted that for the comparative C_(t) method, or ΔΔC_(t) method, the relative concentration of a target genomic sequence to a calibrator is:

X _(N,t) /X _(N,c)=(1+E)^(−ΔΔCt)  (4)

where:

X_(N,t)/X_(N,c)=is the relative concentration of the target relative to the calibrator; and

ΔΔC_(t)=is the normalized difference in threshold cycles for the target and the calibrator

Then, as previously mentioned, as assays are optimized to ensure a maximum in the reaction efficiency, or an efficiency of 1, EQ. 4 simplifies to the calculation known to one of ordinary skill in the art for the comparative C_(t) method previously given:

X _(N,t) /X _(N,c)=2^(−ΔΔCt)  (5)
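As a worked illustration of EQ. 4 and EQ. 5, the short Python sketch below computes the relative concentration from four threshold cycle values. The function name, argument names, and the C_(t) values are hypothetical and chosen only to make the arithmetic easy to follow; with the default efficiency of 1 the expression reduces to 2^(−ΔΔCt).

```python
def relative_quantity(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                      efficiency=1.0):
    """Relative concentration of target versus calibrator (EQ. 4 / EQ. 5)."""
    # Normalize sample and calibrator to the reference sequence.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    # Normalized difference between sample and calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return (1.0 + efficiency) ** (-delta_delta_ct)

# Hypothetical values: relative to the reference, the sample's target crosses
# the threshold one cycle earlier than the calibrator's target does.
rq = relative_quantity(ct_target=24.0, ct_reference=25.0,
                       ct_target_cal=25.0, ct_reference_cal=25.0)
```

Here ΔΔC_(t) is −1, so the sample carries twice the relative amount of target as the calibrator when the efficiency is 1.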

According to various embodiments, an equation for copy number as a function of ΔC_(t) data generated from qPCR assays having a monomodal PDF sub-distribution for each copy number cn with mean, μ_(ΔCt)(cn), is constrained to be described as:

μ_(ΔCt)(cn)=K−log_((1+E))(cn)  (6)

where:

μ_(ΔCt)(cn)=is the mean of the ΔC_(t) sub-distributions as a function of copy number; where cn is a non-zero positive integer

K=is a constant; and

log_((1+E))(cn)=the log to the base (1+E) of copy number cn where E is the efficiency of the PCR amplification of the gene of interest

EQ. 6 follows as a result of EQ. 2. As previously described, variation in ΔC_(t) data around μ_(ΔCt)(cn) may arise within and between samples with the same copy number due to various factors such as, for example, but not limited by, thermal fluctuations in the thermal cycler, and binding behaviors of PCR primers and probes. In various embodiments, an exemplary PDF model may be a normal distribution and, in this case, the full PDF model can be directly characterized by μ_(ΔCt)(cn), K, E, the sample variance, σ², and the probability of each copy number. Though these parameters directly characterize a PDF using the exemplary normal distribution, it should be understood that any mono-modal distribution PDF may be used, for example, but not limited by, the Burr, Cauchy, Laplace, and logistic distributions. A central consideration for selection of a distribution function is that the mean of the PDF is constrained to follow EQ. 6. Accordingly, it should be understood that various mono-modal PDFs, such as, but not limited by, the normal, the Burr, Cauchy, Laplace, and logistic distributions may have different sets of parameters that characterize such model PDF distributions.
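To make the constraint of EQ. 6 concrete, the following Python sketch evaluates the sub-distribution mean for copy numbers 1 through 5. The value K = 1.0 and the efficiency E = 1.0 are arbitrary illustrative choices, and the function name is hypothetical.

```python
import math

def mean_delta_ct(cn, K, E=1.0):
    """Sub-distribution mean from EQ. 6: K - log_(1+E)(cn)."""
    if cn < 1:
        raise ValueError("cn must be a positive integer")
    return K - math.log(cn) / math.log(1.0 + E)

# K = 1.0 is an arbitrary illustrative constant, not a value from the text.
means = [mean_delta_ct(cn, K=1.0) for cn in range(1, 6)]

# Distance between the means of adjacent copy numbers (cn versus cn + 1).
gaps = [means[i] - means[i + 1] for i in range(len(means) - 1)]
```

With an efficiency of 1, the means for CN 1 and CN 2 are a full cycle apart, while the means for CN 4 and CN 5 are separated by roughly a third of a cycle. This shrinking separation is consistent with the earlier observation that ΔC_(t) variability has a larger impact on the resolution of higher CN values.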

Additionally, after the set of sample sub-distribution populations included in the sample frequency distribution have copy numbers assigned, thereby assigning copy numbers to every sample included in each sample sub-distribution, a confidence value for every sample in the sample frequency distribution may be determined.

According to various embodiments, the confidence that the assigned copy number is the true copy number, within the assumption that the PDF model is accurate, may be described most generally by the probability that this is so, as described in the following equation:

$\begin{matrix}\begin{matrix}{{P\left( {{cn}_{assigned} = {cn}_{true}} \right)} = {P\left( {cn}_{assigned} \middle| {\Delta \; C_{r}^{\prime}s} \right)}} \\{= {{P\left( {\Delta \; {Ct}_{r}^{\prime}s} \middle| {cn}_{assigned} \right)}{{P\left( {cn}_{assigned} \right)}/{P\left( {\Delta \; {Ct}_{r}^{\prime}s} \right)}}}} \\{= \frac{\prod\limits_{{cn}_{assigned}}{F\left( {{\Delta \; {Ct}_{r}^{\prime}s};{cn}_{assigned}} \right)}}{\sum\limits_{cn}{\Pi_{cn}{F\left( {{\Delta \; {Ct}_{r}^{\prime}s};{cn}} \right)}}}}\end{matrix} & \left( {{EQ}.\mspace{14mu} 6} \right) \\{\mspace{20mu} {{{where}{{\Delta \; {Ct}_{r}^{\prime}s\mspace{14mu} {refers}\mspace{14mu} {to}\mspace{14mu} {the}\mspace{14mu} {replicate}\mspace{14mu} {observations}\mspace{14mu} {for}\mspace{14mu} a\mspace{14mu} {given}\mspace{14mu} {person}},{and}}F\mspace{14mu} {is}\mspace{14mu} {the}\mspace{14mu} {probability}\mspace{14mu} {distribution}\mspace{14mu} {function}\mspace{14mu} {chosen}\mspace{14mu} {for}\mspace{14mu} {the}\mspace{14mu} {sub}\text{-}{distributions}\mspace{14mu} {that}\mspace{14mu} {is}\mspace{14mu} {constrained}\mspace{14mu} {by}\mspace{14mu} {requiring}\mspace{14mu} {that}\mspace{14mu} {its}\mspace{14mu} {mean}\mspace{14mu} {is}\mspace{14mu} {given}\mspace{14mu} {by}\text{:}}\mspace{20mu} {\mu_{cn} = {K - {\log_{({1 + E})}({cn})}}}\mspace{20mu} {\Pi_{cn}\mspace{14mu} {is}\mspace{14mu} {the}\mspace{14mu} {probability}\mspace{14mu} {of}\mspace{14mu} {copy}\mspace{14mu} {number}\mspace{14mu} {cn}}}} & \;\end{matrix}$

As exemplary, for various embodiments where F is assumed to be a normal distribution, analyses taken from mathematical statistics can be used to produce the following:

$\begin{matrix}{\mspace{20mu} {{{{P\left( {{cn}_{assigned} = {cn}_{true}} \right)} = {\left\lbrack {1 + {\sum\limits_{{cn} \neq {cn}_{a}}{\frac{\Pi_{cn}}{\Pi_{{cn}_{a}}}^{- \Omega}}}} \right\rbrack^{- 1}\mspace{14mu} {where}}}\mspace{20mu} {{subscript}\mspace{14mu} a\mspace{14mu} {is}\mspace{14mu} {shorthand}\mspace{14mu} {for}\mspace{14mu} {assigned}}\mspace{20mu} {\Pi_{cn}\mspace{14mu} {is}\mspace{14mu} {the}\mspace{11mu} {probability}\mspace{14mu} {of}\mspace{14mu} {copy}\mspace{14mu} {number}\mspace{14mu} {cn}}\mspace{20mu} {\Omega \equiv {\frac{1}{\sigma^{2\;}}{\log_{({1 + E})}\left( \frac{cn}{{cn}_{a}} \right)}\left( {\left( {{\hat{\mu}}_{r} - K} \right) + \frac{\log_{({1 + E})}\left( {{cn}_{a\;}{cn}} \right)}{2}} \right)}}}\mspace{20mu} {{{\hat{\mu}}_{r} = {\frac{1}{N_{r}}{\sum\limits_{\underset{{for}\mspace{14mu} a\mspace{14mu} {person}}{{all}\mspace{14mu} {replicates}}}{\Delta \; {Ct}_{r}}}}};{and}}{\sigma^{2} = {{the}\mspace{14mu} {variance}\mspace{14mu} {of}\mspace{14mu} {the}\mspace{14mu} {sub}\text{-}{distributions}\mspace{14mu} {for}\mspace{14mu} {each}\mspace{14mu} {copy}\mspace{14mu} {{number}.}}}}} & \left( {{EQ}.\mspace{14mu} 8} \right)\end{matrix}$
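The relationship between the general Bayes expression and its normal-distribution closed form (EQ. 8) can be checked numerically. The Python sketch below is illustrative only: it takes the summand of EQ. 8 as (Π_cn/Π_cn_a)·e^(−Ω), treats the replicate mean μ̂_r as normally distributed about K−log_((1+E))(cn) with the sub-distribution variance σ², and uses hypothetical parameter values and function names.

```python
import math

def log_base(x, E):
    """Logarithm of x to the base (1 + E)."""
    return math.log(x) / math.log(1.0 + E)

def confidence_bayes(mu_hat, cn_a, priors, K, E, sigma):
    """Posterior probability of cn_a via Bayes' rule with normal F."""
    def weight(cn):
        mu = K - log_base(cn, E)  # mean constrained by EQ. 6
        return priors[cn] * math.exp(-((mu_hat - mu) ** 2) / (2.0 * sigma ** 2))
    return weight(cn_a) / sum(weight(cn) for cn in priors)

def confidence_closed_form(mu_hat, cn_a, priors, K, E, sigma):
    """The same probability computed with the closed form of EQ. 8."""
    total = 1.0
    for cn, prior in priors.items():
        if cn == cn_a:
            continue
        omega = (log_base(cn / cn_a, E) / sigma ** 2) * (
            (mu_hat - K) + log_base(cn_a * cn, E) / 2.0)
        total += (prior / priors[cn_a]) * math.exp(-omega)
    return 1.0 / total

# Hypothetical copy number probabilities and model parameters.
priors = {1: 0.1, 2: 0.6, 3: 0.2, 4: 0.1}
params = dict(priors=priors, K=1.0, E=1.0, sigma=0.15)

p_bayes = confidence_bayes(0.05, 2, **params)
p_closed = confidence_closed_form(0.05, 2, **params)
```

Because every sub-distribution shares the same variance, the normalizing constant of the normal density cancels in the ratio, which is why neither function needs it. The two results agree to floating-point precision under these assumptions.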

According to various embodiments, a confidence value may be determined by first identifying the two sample sub-distributions having the greatest number of samples, and determining the sub-distribution means for the two populations. Such a mean would be the mean of replicate means, or the mean of μ̂_(r) given above in EQ. 8. Recalling EQ. 6:

μ_(ΔCt)(cn)=K−log_((1+E))(cn)  (6)

where:

μ_(ΔCt)(cn)=is the mean of the ΔC_(t) sub-distributions as a function of copy number; where cn is a non-zero positive integer

K=is a constant; and

log_((1+E))(cn)=the log to the base (1+E) of the copy number of a gene in a sub-distribution of sample distributions, where E is the efficiency of the PCR amplification

Then, for various embodiments, μ_(ΔCt)(cn) is estimated for the two populations having the greatest number of samples, yielding two independent equations, which may be used to solve for the two unknowns, K and E. Additionally, the variance for the mean of sample means, σ_(msm), may be determined, as well as Π_(cn), the probability of copy number cn. In various embodiments, a distribution of probabilities that the assigned copy number is the true copy number may be generated using the parameters K, E, σ_(msm), and Π_(cn). According to various embodiments, a Bootstrap technique may be used to generate such a distribution. In various embodiments, once the distribution of the probability measure given by EQ. 7 is generated using the Bootstrap technique, a confidence level may be selected for the EQ. 7 probability measure. For example, in various embodiments a confidence level may be selected assuring that there is a 95% chance that the EQ. 7 probability is equal to or higher than the value determined for this quantity. As will be discussed in more detail subsequently, variables such as the number of samples comprising a sub-population, the copy number, and sample variance may all impact the degree to which high values for the EQ. 7 probability can be achieved.
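The step of solving two instances of EQ. 6 for K and E can be sketched as follows. This is a minimal Python illustration under the assumption that the copy numbers of the two largest sub-distributions are known and their means have already been estimated; the function name and the numeric inputs are hypothetical. The Bootstrap step is not sketched here, since the text does not specify what is resampled.

```python
import math

def solve_K_E(cn1, mu1, cn2, mu2):
    """Solve mu_i = K - log_(1+E)(cn_i), i = 1, 2, for K and E."""
    if cn1 == cn2 or mu1 == mu2:
        raise ValueError("need two distinct copy numbers with distinct means")
    # Subtracting the two equations eliminates K:
    #   mu1 - mu2 = log_(1+E)(cn2 / cn1)
    #   => ln(1 + E) = ln(cn2 / cn1) / (mu1 - mu2)
    ln_base = math.log(cn2 / cn1) / (mu1 - mu2)
    E = math.exp(ln_base) - 1.0
    # Back-substitute into the first equation to recover K.
    K = mu1 + math.log(cn1) / ln_base
    return K, E

# Hypothetical means for the CN 2 and CN 3 sub-distributions, generated
# from K = 1 and E = 1 so that the expected answer is known.
K, E = solve_K_E(cn1=2, mu1=0.0, cn2=3, mu2=1.0 - math.log2(3.0))
```

With these inputs the solver recovers K = 1 and E = 1, within floating-point tolerance.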

With respect to step 330 of FIG. 5 and step 450 of FIG. 6, a confidence interval for a sample associated with a CN interval or limit may be given as follows:

$\begin{matrix}{\mspace{20mu} {{{{P\left( {cn}_{m->n} \middle| {\hat{\mu}}_{r} \right)} = {\frac{{P\left( {\hat{\mu}}_{r} \middle| {cn}_{m->n} \right)}{P\left( {cn}_{m->n} \right)}}{P\left( {\hat{\mu}}_{r} \right)}\mspace{14mu} {where}}}\mspace{20mu} {{\hat{\mu}}_{r} = {\frac{1}{N_{r\;}}{\sum\limits_{\underset{{for}\mspace{14mu} a\mspace{14mu} {person}}{{all}\mspace{14mu} {replicates}}}{\Delta \; {Ct}_{r}}}}}\mspace{20mu} {{P\left( {\hat{\mu}}_{r} \right)} = {\sum\limits_{cn}{{P\left( {\hat{\mu}}_{r} \middle| {cn} \right)}{P({cn})}}}}}\mspace{20mu} {{P\left( {cn}_{m->n} \right)} = {\sum\limits_{{cn} = {m->n}}{\Pi_{cn}\mspace{14mu} {and}}}}{{P\left( {\hat{\mu}}_{r} \middle| {cn}_{m->n} \right)} = {\frac{1}{\sum\limits_{{cn} = {m->n}}\Pi_{cn}}{\sum\limits_{{cn} = {m->n}}{\Pi_{cn}{N\begin{pmatrix}{{K - {L\; \log ({cn})}},} \\\sqrt{\frac{\sigma_{r}^{2}}{N_{r\;}} + \sigma_{p}^{2}}\end{pmatrix}}}}}}}} & \left( {{EQ}.\mspace{14mu} 9} \right)\end{matrix}$

Then algebraically, the following is derived:

$\begin{matrix}{{{P\left( {cn}_{m->n} \middle| {\hat{\mu}}_{r} \right)} = {\frac{\sum\limits_{{cn} = {m->n}}F_{cn}}{\sum\limits_{cn}F_{cn}}\mspace{14mu} {where}}}{F_{cn} = {\Pi_{cn}^{{- {({{\hat{\mu}}_{r} - K + {L\; {lo}\; g\; {cn}}})}^{2}}/{({2{({\frac{\sigma_{r}^{2}}{N_{r}} + \sigma_{p}^{2}})}})}}}}} & \left( {{EQ}.\mspace{14mu} 10} \right)\end{matrix}$

For step 340 of FIG. 5 of method 300 and step 430 of FIG. 6 of method 400, a synchronized display of graphical and tabular sample data may be provided to an end-user via an interactive GUI. FIG. 7 depicts a synchronized display of graphical and tabular sample data according to various embodiments of systems and methods of the present teachings. In the bar chart of graph A, a single bar is highlighted, as is a single point in the major sub-group of a probability distribution scatter plot C, both of which are synchronized with the sample highlighted in assay selection table B and sample results table D. Bar chart A indicates at a glance that the sample selected in sample results table D has been assigned a CN of 2, and additionally the major sub-group in probability distribution plot C is shown to be associated with CN 2 in the legend for probability distribution scatter plot C.

Various embodiments of a probability density scatter plot according to the present teachings are depicted in method 500 of FIG. 8, as well as the plots depicted in FIG. 9A and FIG. 9B. As one of ordinary skill in the art is apprised, various probability density plots, for example, but not limited by, a histogram plot, do not provide an end-user with information about any specific data point in each of a plurality of discrete intervals or bins comprising a probability density plot. One objective of various embodiments of an interactive GUI is to provide an end-user with ready access to a significant amount of information for a plurality of biological samples analyzed in a copy number assay. Such ready access to information provided by a GUI with a synchronized viewing of graphical and tabular data for a sample may greatly assist an end-user with the efficient and timely analysis of such complex data.

In order to enhance the information that can be accessed by an end-user from a probability density plot, a probability density scatter plot was devised. As depicted in method 500 of FIG. 8, ΔC_(t) data for each sample is received by a processor at step 510. As one of ordinary skill in the art of probability density plots is apprised, an abscissa may be divided into discrete intervals or bins. Various teachings in the art of probability density plots give guidance on the determination of bin width, or abscissa intervals, suitable for any particular data set. In FIG. 8, step 530 of method 500, a unique X, Y coordinate is determined for every data point representing a ΔC_(t) for each sample in a probability density scatter plot. In that regard, in contrast to conventional probability density plots, various embodiments of a probability density scatter plot provide for the display of individual data points. For step 540 of FIG. 8, an ordinate scale of normalized count can be determined using a maximum Y value determined in step 530 of FIG. 8. According to various embodiments of systems and methods of the present teachings, a probability density scatter plot may be displayed in a variety of ways, including as a part of an interactive GUI, as shown in FIG. 7.
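One possible way to obtain a unique coordinate per data point is sketched below in Python. The text does not specify how the X and Y values are computed, so the choices here are assumptions: each point takes the center of its bin as X, and its rank within the bin, divided by the largest bin count, as Y. The function name is hypothetical.

```python
import math
from collections import defaultdict

def density_scatter_coords(delta_cts, bin_width):
    """Give every delta-Ct value a unique (x, y) point, stacked within its bin.

    x is the center of the bin containing the value; y is the value's
    1-based rank within that bin divided by the largest bin count, so the
    tallest stack reaches a normalized count of 1.0.
    """
    if not delta_cts:
        return []
    bins = defaultdict(list)
    for i, value in enumerate(delta_cts):
        bins[math.floor(value / bin_width)].append(i)

    max_count = max(len(members) for members in bins.values())
    coords = [None] * len(delta_cts)
    for b, members in bins.items():
        members.sort(key=lambda i: delta_cts[i])  # stack in value order
        x = (b + 0.5) * bin_width
        for rank, i in enumerate(members, start=1):
            coords[i] = (x, rank / max_count)
    return coords

# Hypothetical delta-Ct values: three fall in one bin, two in separate bins.
coords = density_scatter_coords([0.02, 0.05, 0.07, 0.98, 1.03], bin_width=0.1)
```

Under this scheme the outline of the stacked points follows the shape of a histogram, while each point remains individually selectable, which is the property a synchronized display would rely on.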

As depicted in FIG. 9A, which is the data as shown in FIG. 7, various embodiments of a probability density plot provide ready information about sample distribution, while at the same time providing comparative information regarding each sample. As depicted by the legend for FIG. 9A, samples in group I have been assigned to a CN of 1, samples in group II have been assigned to a CN of 2, samples in group III have been assigned to a CN of 3, and samples in group IV have been assigned to a CN of 4. Additionally, a relative relationship among individual sample data points of discrete intervals or bins is visually provided by embodiments of probability density plots of the present teachings. Moreover, as shown in the synchronized display of an interactive GUI depicted in FIG. 7, an individual sample point of a probability density scatter plot may be readily viewed by an end-user, along with an associated wealth of related information. Though the discrete intervals or bins are not readily visible in an embodiment of a probability density scatter plot depicted in FIG. 9A, the bins used to create the plot are shown in FIG. 9B. FIG. 9B provides clarity for how each sample point in each bin has been plotted according to coordinates determined in step 530 of FIG. 8. Various embodiments of a probability density scatter plot may provide the end-user with the display of bins, as shown in FIG. 9B.

FIG. 10 and FIG. 11 depict various embodiments of an interactive GUI relating to user input and display, as depicted in steps 320 and 340 of method 300 of FIG. 5 and steps 440 and 430 of method 400 of FIG. 6, respectively.

In FIG. 10, windows A, B, and C depict various ways that an end-user may select sub-distribution groupings for analysis. According to various embodiments, an end-user may select the number of discrete intervals or bins for a set of biological samples analyzed using a copy number assay. For various embodiments of an interactive GUI, a single value for CN, an interval for CN, or a limit for CN may be selected. As depicted in data highlighted in box D, an end-user may inspect the impact to a confidence value generated for a single value of CN assigned, or for any sub-distribution grouping selected by an end-user as shown in windows A, B, and C of FIG. 10. Accordingly, various embodiments of an interactive GUI of the present teachings may provide an end-user a dynamic and efficient tool for assessing sample data generated for a copy number assay.

FIG. 11 depicts a synchronized interactive GUI according to various embodiments of systems and methods of the present teachings. As is evident by inspecting table A of FIG. 11, the sample summary indicated in the box is a summary of a single CN value assigned and associated confidence value for a sample in comparison to end-user input of sub-distribution grouping. As seen in window B of FIG. 11, an end-user selection of 3 discrete intervals or bins has been entered for comparison to a single CN value and associated confidence value estimate generated for a set of biological samples analyzed for copy number. In probability density scatter plot C, the data is displayed for an analysis assigning a single value for CN, in which 4 sub-distribution groupings are shown. In probability density scatter plot D, a pass/fail designation for each sample shown in plot C is displayed. As can be seen by inspection, a plurality of samples in sub-distribution groupings above a CN of 3 have failed to meet an acceptable confidence value estimate. In probability density scatter plot E, the data analysis has been run according to the end-user input shown in window B, in which all samples for CN of 3 and above have been grouped together for analysis. As can be seen in probability density scatter plot F, which is a pass/fail designation for each sample shown in plot E, all the previously failing samples are now associated with a passing confidence value estimate. The synchronized display of various embodiments of an interactive GUI of the present teachings may display the tabular data coordinated with the graphic display. As can be seen in table A of FIG. 11 for a sample designated as PSC362, the numeric values for CN and confidence value estimate can be clearly seen for the assignment of a single value for CN versus the end-user designated sub-distribution groupings.

It should be noted that various embodiments of an interactive GUI according to the present teachings may utilize combinations of color and shape to indicate various attributes of the sample data being displayed, as is shown, for example, in FIG. 7 and FIG. 11.

While the principles of this invention have been described in connection with specific embodiments, it should be understood clearly that these descriptions are made only by way of example and are not intended to limit the scope of the invention. What has been disclosed herein has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit what is disclosed to the precise forms described. Many modifications and variations will be apparent to the practitioner skilled in the art. What is disclosed was chosen and described in order to best explain the principles and practical application of the disclosed embodiments of the art described, thereby enabling others skilled in the art to understand the various embodiments and various modifications that are suited to the particular use contemplated. It is intended that the scope of what is disclosed be defined by the following claims and their equivalents.

What is claimed is:
 1. A system comprising: a processor; and a memory in communication with the processor; the memory storing instructions for: receiving by the processor a data set of data for a copy number assay for a plurality of samples, determining a ΔC_(t), a copy number and a confidence value for each sample in the plurality of samples; and presenting an end user with an interface for the interactive analysis of the copy number and the confidence value for each sample.
 2. The system of claim 1, wherein the interactive interface comprises a synchronized display of graphical and tabular results for the plurality of samples.
 3. The system of claim 2, where a graphical display of results for the plurality of samples is a probability density scatter plot.
 4. The system of claim 1, wherein the interactive analysis of the copy number and the confidence value for each sample comprises end-user selection of sub-distribution grouping.